Improving Topic Model Stability for Effective Document Exploration

Authors

  • Yi Yang
  • Shimei Pan
  • Yangqiu Song
  • Jie Lu
  • Mercan Topkara
Abstract

Topic modeling has become a ubiquitous analysis tool for text exploration. Most existing work on topic modeling focuses on fitting topic models to input data. It ignores, however, an important usability issue that is closely related to the end-user experience: stability. In this study, we investigate the stability problem in topic modeling. We first report on experiments conducted to quantify the severity of the problem. We then propose a new learning framework that mitigates the problem by explicitly incorporating topic stability constraints into model training. We also perform a user study to demonstrate the advantages of the proposed method.
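The abstract does not say how stability is quantified. As a minimal sketch of one common way to think about it, the Python snippet below compares the top words of the topics produced by two training runs and reports the best average Jaccard overlap over all one-to-one topic alignments. The function names `jaccard` and `run_stability`, the choice of Jaccard overlap, and the brute-force alignment are all assumptions made for illustration; they are not taken from the paper.

```python
from itertools import permutations


def jaccard(words_a, words_b):
    """Jaccard overlap between two lists of top words."""
    set_a, set_b = set(words_a), set(words_b)
    union = set_a | set_b
    if not union:
        return 1.0  # two empty topics are treated as identical
    return len(set_a & set_b) / len(union)


def run_stability(run_a, run_b):
    """Best average Jaccard overlap over one-to-one topic alignments.

    Each run is a list of topics; each topic is a list of its top words.
    Illustrative measure only (an assumption, not the paper's metric).
    Brute force over permutations, so only suitable for a few topics.
    """
    if len(run_a) != len(run_b):
        raise ValueError("both runs must have the same number of topics")
    k = len(run_a)
    if k == 0:
        return 1.0
    best = 0.0
    for perm in permutations(range(k)):
        # perm[i] is the topic in run_b matched to topic i in run_a
        score = sum(jaccard(run_a[i], run_b[j]) for i, j in enumerate(perm)) / k
        best = max(best, score)
    return best


if __name__ == "__main__":
    # Hypothetical top words from two runs on the same corpus.
    first_run = [["a", "b", "c"], ["d", "e", "f"]]
    second_run = [["d", "e", "f"], ["a", "b", "x"]]
    print(run_stability(first_run, second_run))
```

A score near 1 would mean the two runs recovered nearly the same topics up to reordering, while a low score would indicate the kind of run-to-run instability the abstract describes. For more than a handful of topics, the brute-force alignment would need to be replaced by an assignment solver.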

Similar resources

A Probabilistic Topic Model Based on Local Word Relationships in Overlapping Windows

A probabilistic topic model assumes that documents are generated through a process involving topics and then, given the documents, tries to reverse this process to extract the topics. A topic is usually assumed to be a distribution over words. LDA is one of the first and most popular topic models introduced so far. In the document generation process assumed by LDA, each document is a distribution o...

A New Document Embedding Method for News Classification

Text classification is one of the main tasks of natural language processing (NLP). In this task, documents are classified into pre-defined categories. A great deal of news spreads on the web. A text classifier can categorize news automatically, which facilitates and accelerates access to the news. The first step in text classification is to represent documents in a suitable way t...

A Joint Semantic Vector Representation Model for Text Clustering and Classification

Text clustering and classification are two main tasks of text mining. Feature selection plays a key role in the quality of clustering and classification results. Although word-based features such as term frequency-inverse document frequency (TF-IDF) vectors have been widely used in different applications, their shortcoming in capturing semantic concepts of text motivated researchers to use...

Improving Thermal Stability of Starch in Formate Fluids for Drilling High Temperature Shales

Starch is one of the most widely used biopolymers in water based drilling fluids to control fluid loss. The thermal stability of starch in common drilling fluids is low (93 °C). In this study, the thermal stability of starch has been evaluated in sodium/potassium formate and potassium chloride fluids. Samples of mud were prepared by formate salts (sodium and potassium) and potassium chloride wi...

A Topic-Based Search, Visualization, and Exploration System

From literature surveys to legal document collections, people need to organize and explore large amounts of documents. During these tasks, students and researchers will search for documents based on particular themes. In this paper, we use a popular topic modeling algorithm, Latent Dirichlet Allocation, to derive topic distributions for articles. We allow users to specify personal topic distrib...

Publication date: 2016